Zheng Zhu

Tencent, WeChat Pay

SKIP: Sparse Keyframe Interpolation Paradigm for Efficient Embodied World Models

May 30, 2026

SAFE-Pruner: Semantic Attention-Guided Future-Aware Token Pruning for Efficient Vision-Language-Action Manipulation

May 28, 2026

StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement

Apr 20, 2026

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

Apr 12, 2026

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis

Apr 10, 2026

ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

Apr 09, 2026

ViVa: A Video-Generative Value Model for Robot Reinforcement Learning

Apr 09, 2026

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning

Apr 02, 2026

FlashSign: Pose-Free Guidance for Efficient Sign Language Video Generation

Mar 30, 2026

Vega: Learning to Drive with Natural Language Instructions

Mar 26, 2026